ABSTRACT

Nearly a decade since the completion of the first draft of
the human genome, the biomedical community is 
positioned to usher in a new era of scientific inquiry that 
links fundamental biological insights with clinical 
knowledge. Accordingly, holistic approaches are needed 
to develop and assess hypotheses that incorporate 
genotypic, phenotypic, and environmental knowledge. 
This perspective presents translational bioinformatics as 
a discipline that builds on the successes of bioinformatics 
and health informatics for the study of complex diseases. 
The early successes of translational bioinformatics are
indicative of the potential to achieve the promise of the
Human Genome Project for gaining deeper insights into the
genetic underpinnings of disease and progress toward 
the development of a new generation of therapies. 
INTRODUCTION

The study of complex diseases requires the effective
integration and analysis of disparate features that
originate from genotypic, phenotypic, and environmental
sources. In contrast to microscopic
approaches that focus on detailed analyses of a single
data type, a macroscopic approach offers a holistic
view for exploring systems of relationships. Meaningful
insights from a systems theory approach
require the coalescence of many, often intractable,
heterogeneous data types. Traditionally, biomedical
informatics innovations have focused ('microscopically')
on particular domains (eg, clinical innovations in
health informatics; biological innovations in bioinformatics).
This has led to a perceived gulf between bioinformatics
and health informatics, thus decreasing the
potential impact of a 'macroscopic' approach.
Recent years have seen recognition of the growing
need to bridge these domains through the development
of trans-disciplinary training programs and
curricula as well as venues specifically designed to
share innovations that span the laboratory and
clinical spaces (eg, the AMIA Summit on Translational
Bioinformatics). Translational bioinformatics
(TBI) has thus emerged as a systems theory
approach to bridge the biological and clinical divide
through a combination of innovations and resources
across the entire spectrum of biomedical informatics.
Along with complementary areas of
emphasis, such as those focused on developing
systems and approaches within clinical research
contexts, insights from TBI may enable a new
paradigm for the study and treatment of disease.

The rapid escalation of activity in TBI can be
attributed to parallel advancements in the biological
and clinical realms. In biology, we have seen
unprecedented advances in technology, such as
those associated with the generation of molecular
sequences. In healthcare, we are observing a new
era of clinical data acquisition and decision support
that is driven by Federal legislation fostering adoption
of electronic health records and enabling the
seamless exchange of health information. The
challenges in the two realms are also parallel:
both face heterogeneous data integration, missing data, and
semantic mapping. Nonetheless, opportunities to
develop linkages between genetic and clinical
information are also increasing as a result of
participatory initiatives, such as those promoted by
some direct-to-consumer genetic test vendors.
Furthermore, there is great opportunity to leverage
complementary approaches to address these
common challenges (eg, some of the tools developed
by clinical research informatics researchers).

The promise of the $2.7 billion Human Genome 
Project was to enable scientists to understand the 
genetic basis of human disease. However, nearly 
a decade since the completion of the first draft of 
the human genome, there is still much to be 
elucidated. Through technological and computational
advances, the $1000 genome is becoming
a very real possibility. The availability of a large
number of complete human genomes with clinical, 
phenotype, and environmental information may 
enable a new paradigm for the development of new 
sets of hypotheses pertaining to complex diseases, 
such as those that involve multiple genes and 
environmental parameters. A major goal of TBI is
thus to develop informatics approaches for linking 
across traditionally disparate data and knowledge 
sources enabling both the generation and testing 
of new hypotheses. As large volumes of linked 
biological and clinical data become available, the 
complexity of disease may be dissected using novel 
TBI approaches designed in silico, but validated in 
traditional in vitro or even in vivo interventions. 

BUILDING ON PREVIOUS SUCCESSES 

TBI is built on the successes of research that has
evolved in the 30 years since the first use of the
term 'bioinformatics.' Four notable areas germane
to the present discourse are clinical genomics, 
genomic medicine, pharmacogenomics, and genetic 
epidemiology (figure 1). The acceptance of clinical 
genomics (which has the purpose of identifying 
clinically relevant molecular biomarkers) by 
the clinical community can be measured by 
the growing number of clinically relevant genetic 
tests. Genomic medicine, or 'personalized
medicine' (which aims to identify genotype-phenotype correlations
relevant to individuals, or haplotype variation), is positioned
to uncover large-scale genotype-phenotype associations
as a result of genome-wide testing and higher-resolution
representation of clinical data. Pharmacogenomics may also
benefit from ascertaining correlations with data captured for
clinical purposes (eg, as captured in electronic health
records). For example, it may enable correlation of genomic
measurements with clinical phenotypes observed relative to
pharmacological substances (eg, as listed in the Pharmacogenomics
Knowledge Base (PharmGKB)). It may also potentially
provide patient-specific prescribing advice through decision
support systems. Finally, genetic epidemiology is rising to new
levels with the aggregation of genome-based data alongside
public health and environmental registries (eg, as cataloged
in HuGENet). Collectively, these sub-disciplines of bioinformatics
have been suggested as core to the integration of
biological and health data. However, the mere availability of
observations or statistically significant associations is of little
practical value without explanations of potential clinical utility.
This challenge of finding true biomedical explanations has been
encountered before in medicine, for example, when improved
methods for acquiring physiological data were developed.

Figure 1 Bridging biological and
clinical knowledge using translational
bioinformatics. Bioinformatics
approaches, focused on areas from
molecules to populations (eg, clinical
genomics, genomic medicine
('personalized medicine'),
pharmacogenomics, and genetic
epidemiology), form the foundation of
approaches that are used by
translational bioinformatics (TBI; large
bidirectional arrow). TBI thus bridges
knowledge acquired from both the
biological (using bioinformatics) and
health (using health informatics)
domains. Accordingly, the success of
TBI will result in the crossing of the T1
translational barrier, and thus link
innovations from bench to bedside.

The ability to sequence a patient's genome as routinely as
other clinical laboratory tests is no longer a far-fetched
possibility. Accordingly, the sheer volume of potentially
available data poses significant challenges for their integration in 
a form that can be used to either test current hypotheses or 
develop new ones. The heterogeneity of data suggests the need 
for new multi-dimensional paradigms for knowledge integra- 
tion, requiring a deeper understanding of biology than previ- 
ously required by informatics practitioners. Should one only 
consider single nucleotide polymorphic markers, or also include 
intronic (non-coding DNA) regions that have been shown to 
participate in gene regulation? Can gene expression measurements
capture the effects of the environment? How do we then
integrate relevant biological data, such as from proteomic 
studies, and correlate them with fidelity to phenotype data to 
track subtle, but essential, environmental phenomena? Parallel
to the difficulty in addressing these questions, there will be
significant ethical, legal, and social implications to consider.

At the core of TBI is the development of new hypotheses
originating from the integration of genomic and clinical data.
TBI reflects a new era of trans-disciplinary science and
the needed unification of multi-scale biological and clinical
information for enabling the formal postulation of a deeper
understanding of disease, such as originally proposed by Blois
and more recently by Kalet. Understanding the genomic
influences on the complex evolution of disease, the impact of
therapeutic approaches as measured by molecular
biomarkers, and the overall consistency of
genotype-phenotype-environmental correlations across
populations together form the focus of the TBI community.

CHALLENGES IN STUDYING COMPLEX DISEASES 

Understanding complex diseases toward the development and
assessment of putative therapies requires traversing between the
bench and bedside, often referred to as crossing the 'T1 translational
barrier.' As a goal, the objective is uncomplicated: to
ascertain how basic science observations can be applied to clinical
contexts, whether in the form of prognostic, diagnostic, or
therapeutic approaches to disease. As an endeavor, it represents
a grand challenge in modern medicine and also a potential 
paradigm shift for how to integrate a broad set of data points. 

The high dimensionality of the full array of biological and
clinical data that can be generated dwarfs that faced by any
previous attempt at heterogeneous data
integration. There is therefore a need to develop the next
generation of clinical decision support systems that can incor- 
porate data from massive biological datasets that will need to be 
combined with relevant disease phenotype information and 
computable knowledge bases to offer clinically useful sugges- 
tions. Perhaps more mundane, but of equal significance, is the 
need to develop approaches that can accommodate a dizzying 
set of file formats and representation standards. These are not, 
by themselves, completely new challenges to the biomedical 
informatics community. Nonetheless, they reflect a core area of 
emphasis where energy is needed to integrate knowledge across 
clinical genomics, genomic medicine, pharmacogenomics, and 
genetic epidemiology in light of the avalanche of additional 
genomic and clinical data and the corresponding knowledge of 
inter-relationships.

Amidst the challenges of knowledge integration and handling 
unprecedented volumes of data, TBI is greatly challenged with 
developing approaches that can bridge biological knowledge and 
place it into a meaningful clinical context. The volume of data 
can lead to spurious correlations that may be an artifact of the 
data and neither biologically nor clinically insightful. For
example, if a physician had access to a patient's entire genome,
how could it be leveraged to provide clinically insightful
knowledge that would not have been possible using solely data
already in a medical chart (eg, family history of a disease)? As
shown for the genomic era's 'Patient 0,' it is plausible to inte- 
grate genomic data with relevant clinical data to develop prog- 
nostic approaches. The potential to provide appropriate care 
with respect to predicted disease outcome or efficacy of thera- 
peutics offers great incentive for developing TBI approaches that 
integrate the full complement of biological, clinical, and environmental
data. For this reason, phenotypic annotation of
samples whose gene expression or single nucleotide polymorphism
information is available in genomic data repositories such as
GEO and dbGaP is underway in different laboratories,
involving methodologies that are widely used in health informatics
(eg, natural language processing, ontology mapping).
Finally, approaches such as those implemented by the Crimson
system hold promise for capitalizing on the clinical data that
are captured as an artifact of standard clinical care. The extent to
which this type of relatively noisy data can be used for research
remains under active investigation by the TBI community.

Projects that involve TBI approaches to integrate biological 
and clinical data are already underway. The NIH-funded 
eMERGE (Electronic Medical Records and Genomics) project is 
a multi-site endeavor exploring issues involved with linking 
genomic information (from genome-wide association studies) 
with clinical data for individuals with specific conditions. 
Other efforts such as the Personal Genome Project, the Exome
Project, the Million Veteran Program, and the 1000 Genomes
Project reflect the increasing interest of the biomedical research
and clinical communities in studying the complexity of
genotype-phenotype relationships as well as postulating hypotheses
for disease that incorporate genomic data. In addition to
human-based genome projects, there are also initiatives such as the
Human Microbiome Project (HMP) and Metagenomics of the
Human Intestinal Tract (MetaHIT) that strive to provide
a census of commensal microbial flora potentially related to
disease.

THE EMERGING TBI TOOLBOX 

Although bioinformatics and health informatics are conceptually
related under the umbrella of biomedical informatics, the
relationship between them has not always been clear. The TBI
community is specifically motivated by the development of
approaches to identify linkages between fundamental biological
and clinical information. As technological advances continue to
produce data that enhance our ability to understand the
biological underpinnings of complex diseases, the clinical
community will depend on the development of approaches to
interpret these data such that they can be clinically actionable.

TBI approaches are emerging as a melding of a complementary
suite of techniques that strive to meet this need. Network
approaches have led to the development of new techniques to
study drug-target and gene-disease relationships as well as
to provide a deeper understanding of human metabolism.
Techniques have also been developed to combine genomic and
public datasets for studying allelic variation at the population
level. Systems biology approaches have been used to identify
genomic signatures that correlate with the potential efficacy of
vaccines. Finally, high-throughput sequence-based approaches
are showing promise for the identification of prognostic genetic
markers for increasing numbers of rare diseases. As the
results of these early successes suggest, the TBI community is
beginning to work closely with biomedical scientists to develop
a new cadre of approaches to study the complex relationships
between genotypic, phenotypic, and environmental data.
Building on these endeavors will bring us closer than ever before
to an entirely new generation of prognostic tests and highly
effective and personalized clinical interventions.

CONCLUSION 

The decade following the completion of the first draft of the 
human genome has witnessed unprecedented technological 
advancements that have led to the increasing prominence and 
importance of bioinformatics and health informatics for biology 
and healthcare, respectively. The exponential growth of genomic
data, along with parallel achievements in acquiring and
analyzing clinical data, positions the biomedical research enterprise
to deliver on the promise of the Human Genome Project.
TBI is accordingly positioned to enable a systems view of 
complex disease. 